444 Lecture 6
2024-02-01
|   | p | q | p & q |
|---|---|---|---|
| A |   |   |   |
| B |   |   |   |
| C |   |   |   |
| Group |   |   |   |
Here are two methods the judges could use to resolve the case:
Method 1 delivers a win to the plaintiff; method 2 delivers a win to the defendant.
|   | p | q | p & q |
|---|---|---|---|
| A | 1 | 1 | 1 |
| B | 1 | 0 | 0 |
| C | 0 | 1 | 0 |
| Group | 2/3 | 2/3 | 1/3 |
The average of some probabilities is a probability.
If everyone in the group has consistent probabilities, the group will have consistent probabilities.
The group probability is just the average of the probabilities of the members of the group.
By ‘average’ here, we’re talking about the arithmetic mean: add the probabilities up and divide by how many judges there are.
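As a quick sketch (my own illustration, not part of the lecture), here is that averaging rule applied to the 0/1 verdicts in the table above; `Fraction` keeps the 2/3 and 1/3 exact:

```python
from fractions import Fraction

# Arithmetic-mean ("linear") pooling of the judges' verdicts from the table
# above (1 = the judge accepts the proposition, 0 = the judge rejects it).

def pool_linear(verdicts):
    """Group probability = arithmetic mean of the members' probabilities."""
    return Fraction(sum(verdicts), len(verdicts))

p_views = [1, 1, 0]    # A, B, C on p
q_views = [1, 0, 1]    # A, B, C on q
pq_views = [1, 0, 0]   # A, B, C on p & q

print(pool_linear(p_views))    # 2/3
print(pool_linear(q_views))    # 2/3
print(pool_linear(pq_views))   # 1/3
```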
One way to spell this out: no reference to people in the definition.
Another way - if two people switch their views entirely, the group ends up with the same result.
One way to spell this out: no reference to options in the definition. (Compare views that say don’t make a change unless super-majority approves.)
Another way - if everyone switches their views between two options, the position of those two options in the group probability switches.
To know the group view about the probability of p, you just need to know what each member thinks about p, and not about anything else.
The last three slides have described the attractive features of averaging.
I don’t want to undersell these advantages.
Until not that long ago, I thought the problem of how to get a group probability out of individual probabilities was a simple problem with a simple solution … this one.
But in this area we really aren’t allowed to have nice things.
According to probability function Pr, two propositions p and q are independent if any of the following, equivalent, conditions are met. (Caveat: weird things happen if probability of p or q is 0; set that case aside for now.)
\[ \begin{align*} Pr(p | q) &= Pr(p) \\ \frac{Pr(p \wedge q)}{Pr(q)} &= Pr(p) \\ Pr(p \wedge q) &= Pr(p)Pr(q) \end{align*} \]
|   | p | q | p & q |
|---|---|---|---|
| A | 0.9 | 0.9 | 0.81 |
| B | 0.1 | 0.1 | 0.01 |
| Group | 0.5 | 0.5 | 0.41 |
The two propositions are independent according to each member of the group, but not according to the group as a whole.
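To make that concrete (a sketch of my own, using the numbers in the table above):

```python
# Check independence, Pr(p & q) = Pr(p) * Pr(q), for each member and for the
# arithmetic-mean pool, using the numbers from the table above.

members = [
    {"p": 0.9, "q": 0.9, "p&q": 0.81},  # A: independent, since 0.9 * 0.9 = 0.81
    {"p": 0.1, "q": 0.1, "p&q": 0.01},  # B: independent, since 0.1 * 0.1 = 0.01
]

def pool(key):
    return sum(m[key] for m in members) / len(members)

for m in members:
    # Each member treats p and q as independent.
    assert abs(m["p&q"] - m["p"] * m["q"]) < 1e-9

group = {key: pool(key) for key in ("p", "q", "p&q")}
print(group["p"] * group["q"])  # 0.25
print(group["p&q"])             # about 0.41: the group violates independence
```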
Imagine that you know nothing about p, q, but you know that A and B are experts, and think you should defer to them.
How do you defer to both of them?
Natural way - by setting your probability to the average of theirs.
Problem: You’ll end up thinking something both of your experts reject, namely that p and q are not independent.
I’m going to use this heuristic a bit, and I just want to flag it because while I think it’s useful, you might think it’s where I go wrong.
When we are thinking about personal probabilities, we normally assume that learning goes by conditionalisation.
What that means is that the probability of a hypothesis H after learning evidence E is Pr(H | E). Or in symbols, where \(Pr_E\) is the probability function after learning E:
\[ Pr_E(H) = Pr(H | E) = \frac{Pr(H \wedge E)}{Pr(E)} \]
A version of the case we’ve already seen shows that averaging and conditionalisation come into conflict:
Imagine that we’ve just learned that q is true.
This doesn’t affect A and B, because they think p and q are independent.
So the average of their probabilities is 0.5, and that’s arguably the group probability.
But …
Imagine instead that we’d first averaged A’s and B’s probabilities, and then conditionalised the group probability on q.
That looks like it should give the same verdict, but it doesn’t.
The principle that Russell et al call Conditionalisation basically says the following two things should give the same verdict, at least if A and B have learned the same things since yesterday.
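Here is a sketch (my own arithmetic, using the A/B table from earlier) of the two orders coming apart:

```python
# Conditionalise each expert on q and then average, versus average first and
# then conditionalise the pooled probabilities. Numbers are from the A/B table.

A = {"p&q": 0.81, "q": 0.9}
B = {"p&q": 0.01, "q": 0.1}

def p_given_q(pr):
    """Pr(p | q) = Pr(p & q) / Pr(q)."""
    return pr["p&q"] / pr["q"]

# Order 1: conditionalise each expert on q, then pool.
cond_then_pool = (p_given_q(A) + p_given_q(B)) / 2   # (0.9 + 0.1) / 2 = 0.5

# Order 2: pool first, then conditionalise the group probability on q.
pooled = {k: (A[k] + B[k]) / 2 for k in A}
pool_then_cond = p_given_q(pooled)                   # 0.41 / 0.5 = 0.82

print(cond_then_pool, pool_then_cond)  # the two orders disagree
```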
They note that there is one interesting rule that does satisfy this constraint.
It’s moderately tricky to state in full generality, and I’m just going to state a rough version of it, and leave it to more math-centric classes to get the details fully right.
(I’m not sure the paper gets the details fully right, in cases where A and B have opinions over the distribution of a continuous variable. The math here is actually rather tricky.)
If x and y are non-negative numbers, their geometric mean is:
\[ \sqrt{xy} \]
In general the geometric mean of some non-negative numbers \(x_1, x_2, \dots, x_n\) is
\[ \sqrt[n]{x_1 x_2 \dots x_n} \]
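As a rough sketch of why a geometric-mean-based rule helps (my own construction over a four-cell partition, not the paper's official statement): pooling by cellwise geometric mean and renormalising gives the same result whether you conditionalise on q before or after pooling.

```python
from math import prod

# Geometric pooling over the partition {p&q, p&~q, ~p&q, ~p&~q}: take the
# cellwise geometric mean of the members' probabilities, then renormalise
# so the cells sum to 1.

def pool_geometric(dists):
    n = len(dists)
    raw = {c: prod(d[c] for d in dists) ** (1 / n) for c in dists[0]}
    total = sum(raw.values())
    return {c: v / total for c, v in raw.items()}

def given_q(d):
    """Conditionalise on q: keep the q-cells and renormalise."""
    kept = {c: v for c, v in d.items() if c in ("p&q", "~p&q")}
    total = sum(kept.values())
    return {c: v / total for c, v in kept.items()}

# A and B from the earlier table, each treating p and q as independent.
A = {"p&q": 0.81, "p&~q": 0.09, "~p&q": 0.09, "~p&~q": 0.01}
B = {"p&q": 0.01, "p&~q": 0.09, "~p&q": 0.09, "~p&~q": 0.81}

pool_then_cond = given_q(pool_geometric([A, B]))
cond_then_pool = pool_geometric([given_q(A), given_q(B)])
print(pool_then_cond)  # p&q ≈ 0.5, ~p&q ≈ 0.5
print(cond_then_pool)  # same either way round
```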
I don’t think trying to go through the math of that in detail is worth it; if you didn’t get it straight away, nothing I say in 5-10 minutes will help. And the details don’t matter for what we’ll do next.
Instead I want to leave you with a puzzle.
Three excellent detectives, A, B, and C, are investigating a crime.
They’ve each seen all the evidence, but you know they have very different approaches to thinking about crimes and cases like this.
You also know they are much better at crime solving than you are.
They are also each incredibly self-confident; once they’ve formed a view, knowing what the others think won’t change their view.
You ask each of them what they think, and they each say it’s 90% likely that the butler did it.
Question: How confident should you be that the butler did it?
Answer 1: this is easy - 90%.
After all, they agree about this, and they’re smarter/better informed/better at crime solving than you are.
Answer 2: more than 90%. Here’s why.
After hearing A say 90%, it would be natural to be 90% confident that the butler did it.
But then hearing B and C each say that is further evidence that the butler did it.
And when you get evidence like that, your probability should go up.
I have no idea - I think it’s a hard puzzle!
It’s not strictly speaking a math puzzle. Given some mathematical models of combining probabilities we can do some algebra to work out which model agrees with which answer.
As it turns out, arithmetic mean agrees with the first, geometric mean with the second, and there are plenty of other views out there that take one or other side.
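To put numbers on the two answers (a sketch under my own assumptions: plain averaging for the first, and a multiply-and-renormalise rule from the geometric-pooling family for the second - exact weighting and normalisation conventions vary across presentations):

```python
from math import prod

# Three detectives each report 0.9 that the butler did it.
reports = [0.9, 0.9, 0.9]

# Arithmetic mean: the group stays at 0.9, matching the first answer.
arithmetic = sum(reports) / len(reports)

# Multiply-and-renormalise: multiply Pr(butler) across the experts, likewise
# Pr(not butler), then renormalise the two products to sum to 1.
yes = prod(reports)                 # 0.9 ** 3
no = prod(1 - r for r in reports)   # 0.1 ** 3
multiplicative = yes / (yes + no)

print(arithmetic)      # about 0.9
print(multiplicative)  # about 0.9986: well above 0.9, matching the second answer
```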
But the question here is what we want our models to do, not I think one that we should leave to be answered by the models.
We’ll move from pooling probabilities to pooling preferences.
That is, we’re going to start talking about voting systems.